External Memory Parallel Sorting by Sampling
Authors
Abstract
This paper introduces an external memory parallel sorting algorithm for a multiprocessor architecture. The overall goal is to choose p − 1 partitioning elements so that the final p sorted files, one per processor, are of roughly equal size. The algorithm first determines a sample of splitters by either regular sampling or random sampling. Each processor then partitions its data file according to the final splitters, and the resulting sublists are redistributed to the appropriate processors. Finally, each processor sorts its incoming records into runs and merges the sorted runs into a fully sorted file. We implemented the algorithm in C with the MPI package and tested its performance on both a cluster of Sun Solaris workstations and a Linux cluster, CGM1. The results indicate that regular sampling performs better than random sampling.
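The phases described in the abstract can be sketched in a few lines. The following is a minimal in-memory illustration, not the authors' C/MPI implementation: each "processor" is a plain Python list, the function names (`regular_sample`, `random_sample`, `choose_splitters`, `partition`, `sample_sort`) are hypothetical, every block is assumed to be non-empty, and the external-memory run formation is replaced by an ordinary in-memory sort and merge.

```python
import bisect
import heapq
import random


def regular_sample(sorted_block, p):
    """Pick p evenly spaced elements from a locally sorted block."""
    n = len(sorted_block)
    return [sorted_block[(i * n) // p] for i in range(p)]


def random_sample(block, p, rng):
    """Pick up to p elements uniformly at random (the alternative scheme)."""
    return rng.sample(block, min(p, len(block)))


def choose_splitters(samples, p):
    """Sort the gathered samples and keep p - 1 evenly spaced splitters."""
    samples = sorted(samples)
    m = len(samples)
    return [samples[(i * m) // p] for i in range(1, p)]


def partition(sorted_block, splitters):
    """Cut a sorted block into len(splitters) + 1 sublists at the splitters."""
    cuts = [bisect.bisect_right(sorted_block, s) for s in splitters]
    bounds = [0] + cuts + [len(sorted_block)]
    return [sorted_block[bounds[i]:bounds[i + 1]] for i in range(len(bounds) - 1)]


def sample_sort(blocks, rng=None):
    """Sort p non-empty blocks into p sorted, range-partitioned output lists.

    Regular sampling is used by default; passing a random.Random instance
    as rng switches to random sampling.
    """
    p = len(blocks)
    local = [sorted(b) for b in blocks]  # stands in for each processor's local sort
    if rng is None:
        samples = [x for b in local for x in regular_sample(b, p)]
    else:
        samples = [x for b in local for x in random_sample(b, p, rng)]
    splitters = choose_splitters(samples, p)
    parts = [partition(b, splitters) for b in local]
    # "Redistribution": output dst receives sublist dst from every source block,
    # then merges the already sorted sublists into one sorted list.
    return [list(heapq.merge(*(parts[src][dst] for src in range(p))))
            for dst in range(p)]


if __name__ == "__main__":
    gen = random.Random(0)
    data = [gen.randrange(10**6) for _ in range(4000)]
    files = sample_sort([data[i::4] for i in range(4)])
    print([len(f) for f in files])  # sizes of the four output "files"
```

Concatenating the output lists in order yields the fully sorted data, and the printed sizes give a rough picture of how evenly the chosen splitters balance the load under each sampling scheme.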
Similar Resources
Parallel Sorting by Regular Sampling
A new parallel sorting algorithm suitable for MIMD multiprocessors is presented. The algorithm reduces memory and bus contention, which many parallel sorting algorithms suffer from, by using a regular sampling of the data to ensure good pivot selection. For n data elements to be sorted and p processors, when n ≥ p³ the algorithm is shown to be asymptotically optimal. In theory, the algorithm i...
An efficient external sorting algorithm
This paper presents an optimal external sorting algorithm for the two-level memory model. Our method differs from the traditional external merge sort in that it uses sampling information to reduce disk I/Os in the external phase. The algorithm is efficient and simple, and it makes good use of the memory available in recent computing environments. Under a certain memory constraint, this algor...
Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting
We consider the problem of external sorting in a shared-nothing multiprocessor. A critical step in the algorithms we consider is to determine the range of sort keys to be handled by each processor. We consider two techniques for determining these ranges of sort keys: exact splitting, using a parallel version of the algorithm proposed by Iyer, Ricard, and Varman; and probabilistic splitting, whi...
A Novel Approach to Parallel Sorting on Contemporary Architectures
We propose a parallel sorting algorithm that is better suited to today’s cluster-based architectures. For large datasets, it turns out that a widely used algorithm such as sample sort might spend a considerable amount of time in the sampling phase. The sampling process itself requires “well chosen” parameters which yield “good samples”. Additionally, it requires an extra redistribution step at ...
Reducing I/O Complexity by Simulating Coarse Grained Parallel Algorithms
Block-wise access to data is a central theme in the design of efficient external memory (EM) algorithms. A second important issue, when more than one disk is present, is fully parallel disk I/O. In this paper we present a deterministic simulation technique which transforms parallel algorithms into (parallel) external memory algorithms. Specifically, we present a deterministic simulation techniq...